I.  INTRODUCTION

Considerable effort has been spent by the Naval Personnel Research and
Development Center (NPRDC) to develop a model that would enable the Navy
to forecast future states of the enlisted force structure. This model,
entitled FAST (see [2], [4] and [5]), is a highly comprehensive model
that involves acquisitions, losses, and advancements, as well as a large
number of subcategories of these variables of the Navy personnel force.
FAST has been used successfully in the past few years as a long-range
planning tool as well as for researching the behavior of the enlisted
force. Because of the complexity of the model, its operation requires a
large amount of data processing and computer time.

In  an  attempt  to  increase  the  flexibility  of  FAST,  this  research 
effort   concentrated  on  a  single  variable  of  the  personnel  force: 
losses.   Since  forecasting  future  losses  is  one  of  the  major  tasks 
of  FAST,  it  was  considered  important  to  attempt  to  simplify  that 
single  aspect  of  FAST. 

II.  THE  FORECASTING  PROBLEM 

The enlisted Navy force is organized and managed along the lines of
ratings, that is, job skills within the Navy. Consequently, losses must
be forecast for each rating individually. In addition, it is preferable
to categorize losses simultaneously by length of service and pay grade,
so that the effects of projected losses on the force structure can be
forecast as well.


When all of the above variables are considered simultaneously, the
number of individuals in each category is greatly diminished. For
example, while the number of E-5s with 15 years of service may be
several hundred, the number of Electronics Technicians who are E-5s
with 15 years of service is small.

This problem of sparse data makes accurate forecasting difficult.
Forecasting procedures are all predicated on some statistical stability
in people's actions. This stability comes about with large populations
of individuals whose reactions are similar. With the small populations
that are inherent in sparse data, the consequent lack of statistical
stability makes reliable forecasting difficult at best.
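To make the stability argument concrete, the sketch below assumes (an illustrative model of ours, not one stated in this report) that each man in a cell leaves independently with the same probability p, so the observed loss rate L/I is a binomial proportion with standard error sqrt(p(1 - p)/I). The function name `loss_rate_std_error` and the cell sizes are hypothetical.

```python
import math


def loss_rate_std_error(p: float, n: int) -> float:
    """Standard error of an observed loss rate L/I.

    Illustrative assumption: each of the n men on board leaves independently
    with probability p, so L ~ Binomial(n, p) and L/n has variance p(1-p)/n.
    """
    if n <= 0:
        raise ValueError("population size n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("loss probability p must lie in [0, 1]")
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Hypothetical sizes: a sparse rating/pay-grade/length-of-service cell
    # versus progressively larger pooled groups, all with a 25% loss probability.
    for n in (20, 100, 500):
        print(f"n = {n:4d}: standard error = {loss_rate_std_error(0.25, n):.4f}")
```

Under this assumption, pooling 25 times as many men into one group cuts the standard error of the estimated loss rate by a factor of 5, the square root of 25.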

To help overcome the problems caused by sparse data, the populations can
be recombined to form fewer groups of larger sizes. A natural choice for
this combination, or pooling of data, is along the lines of ratings.
That is, if ratings which exhibit statistically similar loss behavior
are identified and grouped, or clustered together, the resulting
clusters can be used in place of individual ratings to gain some
statistical stability. The pooling of data in clusters of ratings is
sought only to improve the estimates of loss characteristics and of
certain parameters in statistical models; losses can still be forecast
for each rating. This, then, is one reason for finding clusters of Navy
ratings which exhibit similar loss behavior. Another application of the
clustering would be to identify groups of ratings to which common
policies regarding loss and retention might be applied. The following
sections of this report describe approaches to identifying the clusters
and a procedure for estimating their possible effectiveness in improving
forecasts.

For the purpose of our analysis, losses were defined to include losses
for all reasons, from all pay grade and length of service cells. Actual
prediction of losses is more complex, involving many variables, as
described in [2] and [4].

III.   HIERARCHICAL  CLUSTERING 

A common technique for clustering is the hierarchical clustering method.
We give a brief description of the method here; Ref. [1] provides more
details.

The hierarchical clustering approach groups objects, in our case Navy
ratings, into a nested sequence of clusterings: each cluster at one
level is contained in a cluster at the next, coarser level. Figure 1
shows a small example of the result for 5 objects.

The  tree  structure  in  Figure  1,  called  a  dendrogram,  indicates 
how  this  procedure  formed  the  groups  of  clusters.   The  order  shown 
here  is  not  unlike  the  groupings  which  occur  in  biological  taxonomy, 
where  all  life  forms  are  grouped,  first  into  species,  then  into 
genera,  then  into  families,  and  so  on.   This  method  may  appropriately 
be  called  numerical  taxonomy. 

Figure 1: A Dendrogram for Hierarchical Clustering

The dendrogram in Figure 1 shows the 5 individual objects being grouped
into two groups: objects 1 and 2, and objects 3, 4, and 5. This is the
first grouping beyond the base level of 5 singleton groups. A coarser
grouping brings all 5 objects into a single set. The distance scale
provides a measure of selectivity in forming the groups. If the
"distance" allowed between objects to be clustered together is 10, then
just two groups are formed. This criterion must be increased to 90
before the two groups become one, thus indicating that the clustering
into two groups is probably natural, while a clustering into one group
is probably not. The interpretation of which groupings are natural is
somewhat subjective if based only on the dendrogram. As described
later, the clusters in this application are evaluated independently of
the dendrogram.
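The threshold reading of a dendrogram can be sketched in a few lines. Each merge at or below the allowed distance t joins two groups, so the number of groups drops by one per accepted merge. The merge heights below are hypothetical values chosen only to reproduce the situation just described (three merges at or below 10, the final merge at 90), and `clusters_at_threshold` is our own name.

```python
from typing import Sequence


def clusters_at_threshold(n_objects: int, merge_heights: Sequence[float], t: float) -> int:
    """Number of groups left when every merge at height <= t is accepted.

    A dendrogram over n objects has exactly n - 1 merges; each accepted
    merge joins two groups, so the count drops by one per merge at or below t.
    """
    if len(merge_heights) != n_objects - 1:
        raise ValueError("a dendrogram over n objects has exactly n - 1 merges")
    return n_objects - sum(1 for h in merge_heights if h <= t)


if __name__ == "__main__":
    heights = [3.0, 6.0, 8.0, 90.0]  # hypothetical merge heights for 5 objects
    for t in (5.0, 10.0, 89.0, 90.0):
        print(f"t = {t:5.1f}: {clusters_at_threshold(5, heights, t)} group(s)")
```

With these heights, two groups persist for every threshold from 8 up to just below 90. That long stretch is the informal signal that the two-group clustering is natural.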

In order to produce a dendrogram, a "distance" between each pair of
objects must be specified. In this application, the objects are
enlisted Navy ratings, and the distance between two ratings should
measure the proximity of their loss behavior. The distance function
chosen for this purpose is

$$d(k,m) = \left[ \sum_{i=1}^{7} p^{\,7-i} \left( \ell_{i,k} - \ell_{i,m} \right)^{2} \right]^{1/2}$$

where

$d(k,m)$ = distance between ratings $k$ and $m$,
$\ell_{i,k}$ = loss rate from rating $k$ in year $i$,
$\ell_{i,m}$ = loss rate from rating $m$ in year $i$,
$p$ is a parameter, $0 < p < 1$,

and years are indexed with $i = 1$ for 1966, $i = 2$ for 1967, ...,
$i = 7$ for 1972. These years are used simply because they comprise the
data base for the research project. The parameter $p$ is included to
weight recent years more heavily: year $i$ receives weight $p^{7-i}$, so
1972 receives weight 1 and earlier years receive progressively smaller
weights. Thus, two ratings are judged "close" by this criterion if their
loss rates are close, especially in recent years. The specific value for
the parameter $p$ remains to be determined by the methods discussed in a
later section.
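The weighted distance can be written directly from the formula. This is a minimal sketch; the function name `rating_distance` is hypothetical, and it generalizes the seven years 1966-72 to any number of years T, with the most recent year always weighted by 1. The loss rates in the usage example are invented.

```python
import math
from typing import Sequence


def rating_distance(loss_k: Sequence[float], loss_m: Sequence[float], p: float) -> float:
    """Weighted distance d(k, m) between two ratings' loss-rate histories.

    loss_k[i - 1] and loss_m[i - 1] hold the loss rates for year i
    (i = 1 for 1966, ..., i = 7 for 1972).  Year i is weighted by
    p ** (T - i), where T is the number of years, so the latest year gets 1.
    """
    if len(loss_k) != len(loss_m):
        raise ValueError("both ratings need the same number of years")
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    T = len(loss_k)
    total = sum(
        p ** (T - i) * (lk - lm) ** 2
        for i, (lk, lm) in enumerate(zip(loss_k, loss_m), start=1)
    )
    return math.sqrt(total)


if __name__ == "__main__":
    base = [20.0] * 7          # hypothetical loss rates (%), 1966-72
    late = base[:6] + [25.0]   # differs by 5 points in 1972 only
    early = [25.0] + base[1:]  # differs by 5 points in 1966 only
    print(f"{rating_distance(base, late, p=0.8):.4f}")   # 5.0000
    print(f"{rating_distance(base, early, p=0.8):.4f}")  # 2.5600 (= 5 * 0.8**3)
```

The same 5-point gap counts fully in 1972 but is shrunk by $\sqrt{p^6} = p^3$ in 1966, which is how $p$ makes recent years dominate.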


Once a distance between ratings has been defined, it is necessary to
define a distance function between subsets of ratings, because the
hierarchical clustering algorithm requires one. While many definitions
of distance between subsets are possible, two were investigated and one
was finally used. The "maximum metric" is defined to be the maximum of
all distances between pairs of objects, one chosen from each subset. If
$C_1$ and $C_2$ are two subsets of ratings, we have

$$d_{\max}(C_1, C_2) = \max\{\, d(k,m) \mid k \in C_1,\ m \in C_2 \,\}.$$

The "minimum metric" $d_{\min}$ is defined analogously, with MIN
replacing MAX in the above definition.

Under the maximum metric, two subsets of ratings are close only if every
rating in one subset is close to every rating in the other. The minimum
metric only requires that some pair of ratings, one from each subset, be
close, while the others may be distant, for the subsets to be close.
These two definitions generate strikingly different dendrogram shapes,
as illustrated later.
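The maximum and minimum metrics correspond to what the clustering literature calls complete linkage and single linkage. The sketch below is a straightforward, unoptimized agglomerative procedure over a precomputed distance matrix such as $d(k,m)$. The function name `agglomerate` and the five one-dimensional "ratings" are hypothetical, and ties are broken by keeping the first pair found.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Merge = Tuple[List[int], List[int], float]


def agglomerate(dist: Sequence[Sequence[float]], metric: str = "max") -> List[Merge]:
    """Agglomerative hierarchical clustering on a symmetric distance matrix.

    metric="max" uses the maximum metric (complete linkage);
    metric="min" uses the minimum metric (single linkage).
    Returns the merges in order as (cluster_a, cluster_b, height).
    """
    if metric not in ("max", "min"):
        raise ValueError("metric must be 'max' or 'min'")
    combine = max if metric == "max" else min
    clusters = [frozenset([k]) for k in range(len(dist))]
    merges: List[Merge] = []
    while len(clusters) > 1:
        best = None  # (height, index_a, index_b) of the closest pair of subsets
        for a, b in combinations(range(len(clusters)), 2):
            h = combine(dist[k][m] for k in clusters[a] for m in clusters[b])
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), h))
        merged = clusters[a] | clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    points = [0.0, 6.0, 82.0, 85.0, 90.0]  # hypothetical 1-D "ratings"
    dist = [[abs(x - y) for y in points] for x in points]
    for metric in ("max", "min"):
        print(metric, [h for _, _, h in agglomerate(dist, metric)])
```

On these points the maximum metric merges at heights 3, 6, 8, and 90, while the minimum metric merges at 3, 5, 6, and 76, joining the two groups through their closest members. That contrast is one source of the differing dendrogram shapes.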

IV.   CLUSTERING  BY  CORRELATION 

1.   Correlating  Population  Size  and  Corresponding  Loss  Rate. 
Examination of the data on population sizes and loss rates in various
ratings over the years 1966-72 suggested that ratings may be grouped on
the basis of whether their population size correlates positively or
negatively (and to what extent) with their corresponding loss rate.


For example, it appears that some ratings, such as Quartermaster (200
QM), have their loss rates increase (or decrease) together with their
population sizes over the years 1966-72. At the same time, other
ratings, such as Construction Recruit (6000 CR), have population sizes
and loss rates that tend (in most cases) to move in opposite directions
from one year to the next.

The correlation between population size and loss rate was studied for
all ratings and for "All Navy" over the seven data points provided by
the years 1966-72. In addition to measuring the correlation directly for
these data points, rank correlation was also used, since the actual
magnitudes of the changes in population size seemed both unimportant
and incongruous when compared to changes in the loss rate.

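Two different rank correlation coefficients were used. These
coefficients (see [1]) are defined below in terms of the rankings
$P_1, \ldots, P_7$ of the seven population sizes of a given rating over
the years 1966-72, and the rankings $\ell_1, \ldots, \ell_7$ of the seven
corresponding loss rates.

(i) Spearman's Rho:

Let $D_i = P_i - \ell_i$, $i = 1, \ldots, 7$, be the differences in the
rankings. Then

$$\rho = 1 - \frac{1}{56} \sum_{i=1}^{7} D_i^{2}.$$

(The constant 1/56 is the general factor $6/[n(n^2-1)]$ with $n = 7$.)

(ii) Kendall's Tau:

Let

$$A_{ij} = \begin{cases} +1 & \text{if } (P_i - P_j)(\ell_i - \ell_j) > 0, \\ -1 & \text{if } (P_i - P_j)(\ell_i - \ell_j) < 0, \end{cases} \qquad i, j = 1, \ldots, 7.$$

Then

$$\tau = \frac{1}{21} \sum_{1 \le i < j \le 7} A_{ij}.$$

(The constant 1/21 is the general factor $2/[n(n-1)]$ with $n = 7$, the
reciprocal of the number of pairs $i < j$.)

(iii) Ordinary Correlation Coefficient:

If $P_i$ and $\ell_i$ denote the actual magnitudes of the population
sizes and corresponding loss rates, respectively, of a rating over the
years 1966-72, the correlation coefficient is defined as

$$r = \frac{\sum_{i=1}^{7} (P_i - \bar{P})(\ell_i - \bar{\ell})}{\left[ \sum_{i=1}^{7} (P_i - \bar{P})^{2} \sum_{i=1}^{7} (\ell_i - \bar{\ell})^{2} \right]^{1/2}}$$

where

$$\bar{P} = \frac{1}{7} \sum_{i=1}^{7} P_i \quad \text{and} \quad \bar{\ell} = \frac{1}{7} \sum_{i=1}^{7} \ell_i.$$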
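The three coefficients can be computed from two seven-year series as follows. This is a minimal sketch with hypothetical function names. `ranks` assumes no tied values. `kendall_tau` scores a tied pair as 0, which the definition above leaves unspecified. Kendall's tau is computed from the raw values because ranking does not change the sign of any pairwise difference.

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank 1 for the smallest value.  Assumes no ties (sketch assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))  # 6 / (7 * 48) = 1/56 for n = 7


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # A_ij = +1, -1, or 0 for a tie
    return 2.0 * s / (n * (n - 1))  # 2 / (7 * 6) = 1/21 for n = 7


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical population sizes and loss rates (%) for one rating, 1966-72.
    pop = [1000, 1100, 1050, 1200, 1300, 1250, 1400]
    loss = [18.0, 19.5, 19.0, 21.0, 20.5, 22.0, 23.0]
    print(f"rho = {spearman_rho(pop, loss):.3f}")
    print(f"tau = {kendall_tau(pop, loss):.3f}")
    print(f"r   = {pearson_r(pop, loss):.3f}")
```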

Each of these correlation coefficients provides a method of clustering
ratings. Kendall's Tau seemed, perhaps, the most accommodating in
providing clusters that separate in a somewhat natural way. Thus, three
clusters may be formed on the basis of the values of Kendall's Tau:

(i) Ratings with $-1.00 \le \tau \le -0.13$ (Cluster A)

(ii) Ratings with $-0.13 < \tau < +0.50$ (Cluster B)

(iii) Ratings with $+0.50 \le \tau \le +1.00$ (Cluster C)

Table 1 shows a histogram of loss rates for ratings against their
$\tau$-values. Each of the three clusters may be broken into further
subclusters in various ways based on the loss rates of the ratings in
each cluster. Such methods are suggested in the next subsection.
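Assigning a rating to Cluster A, B, or C is a direct threshold test on its $\tau$ value. In this sketch, `tau_cluster` is a hypothetical name, and the boundaries are the ones listed for the rating's own population size. The usage loop relies on a fact about seven untied data points: there are 21 pairs, so the concordant-minus-discordant count is odd and $\tau$ can only take the values $s/21$ for odd $s$ between -21 and 21.

```python
def tau_cluster(tau: float) -> str:
    """Cluster A, B, or C from a rating's Kendall's tau, using the
    boundaries -0.13 and +0.50 listed above."""
    if not -1.0 <= tau <= 1.0:
        raise ValueError("Kendall's tau must lie in [-1, 1]")
    if tau <= -0.13:
        return "A"
    if tau < 0.50:
        return "B"
    return "C"


if __name__ == "__main__":
    # With 7 years and no ties, tau = s/21 for odd s between -21 and 21.
    for s in range(-21, 22, 2):
        print(f"{s / 21:+.2f} -> Cluster {tau_cluster(s / 21)}")
```

Both boundaries fall strictly between attainable values (-3/21 and -1/21, 9/21 and 11/21), so under the no-ties assumption no rating sits exactly on a boundary.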
Table 1: Histogram of Loss Rates of Ratings Against Kendall's Tau

2. Correlating Loss Rates with All Navy Population Size.
If the above procedure for clustering ratings is to be useful, it should
provide a procedure for forecasting future loss rates through the use of
clusters. Since the above clusters are obtained by correlating loss
rates of ratings with the corresponding population sizes, one would need
reasonably accurate estimates of future population sizes in each rating
in order to forecast corresponding loss rates (and then actual losses).
It seems unlikely that such estimates would be available for each
rating, and certainly not several years in advance. If good estimates of
population sizes are available for future years at all, they will be
for "All Navy" only. For that reason, it appears desirable to correlate
loss rates of ratings with "All Navy" population size. The three
correlation coefficients defined above are again relevant, with the only
change that $P_1, \ldots, P_7$ now denote the "All Navy" population
sizes, or their rankings, over the years 1966-72. Table 2 presents the
lists of ratings in three clusters formed on the basis of Kendall's Tau.
The three clusters are:

(i) Ratings with $-1.00 \le \tau \le -0.15$ (Cluster A)
(ii) Ratings with $-0.15 < \tau \le +0.25$ (Cluster B)
(iii) Ratings with $+0.25 < \tau \le +1.00$ (Cluster C)

All three of these clusters may be considered too big, and in any case
the loss rates of ratings within each cluster vary widely. Since
clusters are envisioned as groups of ratings with like loss rates, it is
necessary to break each of the above clusters into further subclusters.
(The same remark applies when clustering is accomplished by correlating
each rating's loss rate with its own population size.)

Table 2: Loss Rates (%) of Cluster A, B, and C Ratings, 1966-72, with Kendall's Tau Against All Navy Population Size

Further subclusters may be formed by selecting one of several candidate
statistics, such as:

(i) The mean loss rate of ratings over the seven years;

(ii) The median loss rate of ratings over the seven years;

(iii) The mean or median loss rate of ratings over the last three years
only;

(iv) The loss rate of ratings in the last year only.

For demonstration purposes, one of these statistics, namely the median
loss rate of ratings over the three years 1970-72, was selected.
Figure 2 shows each of the ratings (and "All Navy") represented by its
median loss rate over the years 1970-72. The three clusters referred to
above are separated in the graph. The graph itself suggests further
subclusters based on the size of the loss rates. For example, Cluster A
may be grouped into four subclusters based on the median loss rate
$\ell_i^{(m)}$ of rating $i$ given by statistic (iii):

(a) Ratings in Cluster A with $0\% < \ell_i^{(m)} \le 20\%$ (A1)

(b) Ratings in Cluster A with $20\% < \ell_i^{(m)} \le 27\%$ (A2)

(c) Ratings in Cluster A with $27\% < \ell_i^{(m)} \le 33\%$ (A3)

(d) Ratings in Cluster A with $33\% < \ell_i^{(m)} \le 100\%$ (A4)
Similar  subclusters  may  be  formed  within  Clusters  B  and  C.   These 
are  indicated  in  Figure  2  by  vertical  lines  drawn  as  boundaries 
between  neighboring  subclusters. 
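The subcluster rule for Cluster A amounts to taking the median of the last three yearly loss rates and comparing it with the 20%, 27%, and 33% boundaries. In this sketch the function name `subcluster_of_cluster_a` and the input values are hypothetical. The rule applies only to ratings already placed in Cluster A.

```python
from statistics import median
from typing import Sequence

# Upper bounds (percent) of subclusters A1-A4, as listed above.
_A_BOUNDS = ((20.0, "A1"), (27.0, "A2"), (33.0, "A3"), (100.0, "A4"))


def subcluster_of_cluster_a(loss_rates_1966_72: Sequence[float]) -> str:
    """Subcluster (A1-A4) of a Cluster A rating from its median loss rate
    over the last three years, 1970-72 (loss rates in percent)."""
    if len(loss_rates_1966_72) != 7:
        raise ValueError("expected seven yearly loss rates, 1966-72")
    m = median(loss_rates_1966_72[-3:])  # the years 1970, 1971, 1972
    if not 0.0 < m <= 100.0:
        raise ValueError("median loss rate must lie in (0, 100]")
    # The last upper bound is 100, so some label always matches.
    return next(label for upper, label in _A_BOUNDS if m <= upper)


if __name__ == "__main__":
    # Hypothetical loss rates (%) for a Cluster A rating, 1966-72.
    rates = [15.0, 18.0, 21.0, 24.0, 26.0, 30.0, 28.0]
    print(subcluster_of_cluster_a(rates))  # median of 26, 30, 28 is 28 -> A3
```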

One shortcoming of this method is that it is quite "ad hoc" in selecting
the boundaries between clusters and subclusters. Also, since clusters
are initially formed based on values of the correlation coefficients,
ratings with similar losses may be found in separate clusters. For
example, many ratings in Cluster C have loss rates closer to those of
some ratings in Cluster B than to those of ratings in their own
subcluster. This may be regarded as a disadvantage if one considers it
an overriding necessity to cluster by like loss rates. On the other
hand, ratings with similar loss rates may be placed in different
clusters because their loss rates are tending in opposite directions
over the years. In such cases it may be desirable to group these
ratings separately despite their like loss rates.

Because  of  the  ad  hoc  nature  of  this  clustering  method  it 
was  not  used  in  the  rest  of  this  research  effort. 

V.   EVALUATION  OF  HIERARCHICAL  CLUSTERS 

The  methods  described  above  lead  to  various  clusterings  or 
partitions  of  the  enlisted  ratings.   In  this  section,  we  describe 
how  any  such  partition  was  evaluated. 

Let the set of enlisted ratings be designated $S$, where

$$S = \{1, 2, \ldots, N\}$$

and $N$ is the number of ratings being considered. In our case,
$N = 71$ ratings. The total number of individual ratings is about 130;
however, some of the 130 are service ratings which support a general
rating. In these instances, several service ratings contain men
specializing in a similar area, usually at the middle pay grades such as
E-4 to E-6 or E-7. A single general rating associated with these service
ratings contains all men at the pay grades beyond those of the service
ratings, in the common area. The general rating then contains the
foremen and line managers for the men in the service ratings. When this
occurred, all the service ratings and their associated general rating
were combined into a pseudo rating for the analysis. This avoided having
ratings with only a few pay grades. The common technical skill areas of
these ratings made this combination seem natural, and it reduced the
number of ratings analyzed to 71. A few recent ratings with no history
in our data base were left out, as they were a special case and quite
few in number. Table 3 shows the definition of the ratings used for the
study, with the actual rating codes included in each of our ratings.

With the ratings as defined above, a partition or clustering of $S$ is a
set of subsets $C_k$ of $S$ for which

$$C_k \cap C_j = \emptyset \quad \text{if } k \neq j, \qquad \bigcup_{k} C_k = S.$$

If there are $m$ subsets $C_k$ $(k = 1, \ldots, m)$, the partition is
said to be of size $m$. Many partitions, suggested primarily by the
hierarchical clustering method, were evaluated by the method described
below.
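The two defining conditions, pairwise disjointness and a union equal to $S$, can be checked mechanically before a candidate clustering is evaluated. This is a minimal sketch; `is_partition` is a hypothetical name. Like the definition above, it does not forbid empty subsets.

```python
from typing import Iterable, Set


def is_partition(subsets: Iterable[Set[int]], n: int) -> bool:
    """True if the subsets C_k are pairwise disjoint and their union is
    S = {1, ..., n}, the two conditions stated above."""
    covered: Set[int] = set()
    for c in subsets:
        if covered & c:  # shares a rating with an earlier subset
            return False
        covered |= c
    return covered == set(range(1, n + 1))


if __name__ == "__main__":
    # A hypothetical partition of the 71 ratings into m = 3 subsets.
    clustering = [set(range(1, 31)), set(range(31, 61)), set(range(61, 72))]
    print(is_partition(clustering, 71), "size", len(clustering))
```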

This  research  investigation  was  conducted  for  the  express 
purpose  of  finding  out  if  the  prediction  of  losses  by  forecasting 
loss  rates  could  be  improved  when  data  was  pooled  among  ratings  in 
clusters,  for  some  systematically  well-defined  clustering.   The 
approach  was  to  forecast  losses  by  a  method  approximating  the  one 
actually  used,  and  for  which  the  clustering  was  originally  intended 
The  forecasting  was  done  for  the  year  1973  (fiscal  year)  ,  usir.a 


Index  in 

S 
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4 

5 
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7 

8 
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14 
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16 
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18 

19 

20 

21 

22 

23 

24 

25 

26 

27 

28 

29 

30 
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RATINGS  USED  IN  THE  STUDY 

Name  Rating  Codes 

Boatswains  Mate  100 

Quartermaster  200 

Signalman  250 

Operations  Specialist  300 

Sonar  Technicians  400,  401,  404 

Torpedomans  Mate  500 

Gunners  Mates  600,  601,  604 

Gunners  Mate  Technician  602 

Fire  Control  Technicians  800,  801,  802,  803 

Missile  Technician  810 

Mineman  900 

Electronics  Technicians  1000,  1001,  1002 

Data  Systems  Technician  1010 

Instrumentman  1100 

Opticalman  1200 

Radioman  1500 

Communication  Technicians  1600,  1611,  1622, 

                                              1633, 1644, 1655, 1666
18   Yeoman                                   1700
19   Legalman                                 1701
20   Personnelman                             1800
21   Data Processing Technician               1900
22   Storekeeper                              2000
23   Disbursing Clerk                         2100
24   Commissaryman                            2290
25   Ships Serviceman                         2490
26   Journalist                               2600
27   Postal Clerk                             2700
28   Lithographer                             3100
29   Illustrator Draftsman                    3200
30   Musician                                 3300
31   Seaman Recruit                           3600
32   Machinists Mate                          3700
33   Engineman                                3800
34   Machinery Repairman                      3900
35   Boilerman                                4000, 4020
36   Electricians Mate                        4100
37   Interior Communication Elec.             4200
38   Hull Technicians                         4300, 4410, 4411, 4412
39   Damage Control                           4500
40   Patternmaker                             4600
41   Moulder                                  4700
42   Fireman Recruit                          5000
43   Engineering Aid                          5100, 5101, 5102
44   Construction Electrician                 5300, -1, -2, -3, -4, -5, -6
45   Equipment Operator                       5410, 5411, 5412
46   Construction Mechanic                    5500, 5503, 5504
47   Builder                                  5600, 5601, 5602, 5503
48   Steel Worker                             5700, 5703, 5704
49   Utilitiesman                             5800, 5801, 5802, 5803, 5804
50   Construction Recruit                     6000
51   Aviation Machinists Mate                 6200, 6205, 6206
52   Aviation Electronics Technician          6300, 6304, 6306, 6307
53   Aviation Antisub Warfare Technician      6310
54   Aviation Ordnanceman                     6500
55   Aviation Fire Control Technician         6520, 6521, 6522
56   Air Controlman                           6600
57   Aviation Boatswains Mate                 6700, 6704, 6705, 6706
58   Aviation Electricians Mate               6800
59   Aviation Structural Mechanic             6900, 6901, 6902, 6903
60   Aircrew Survival Equipman                7000
61   Aerographers Mate                        7100
62   Tradevman                                7200
63   Aviation Storekeeper                     7300
64   Aviation Maintenance Admin.              7400
65   Aviation Support Equip. Technician       7500, 7501, 7502, 7503
66   Photographers Mate                       7600
67   Photographic Intelligence                7700
68   Airman Recruit                           7800
69   Hospital Corpsman                        8000
70   Dental Technician                        8300
71   Steward                                  8500


TABLE  3 
data in the years 1966-72. Then, the predicted losses were
compared to the actual losses in 1973. The prediction scheme
was not detailed enough to be used for actually forecasting
losses; it was intended only as a means of evaluating clustering.
If clustering is to improve forecasting significantly (by any
means), then it should improve forecasting by the elementary
prediction scheme given below.

To evaluate any clustering or partition $C_k$, $k = 1, \ldots, m$,
the following approach was used. First, a projection of total
losses was made for each individual rating by projecting the loss
rate, i.e., the proportion of those on board at the year's start
who would be lost over the year. Let

   $I_{ij}$ = inventory (of men) at the beginning of year i, in rating j,

   $L_{ij}$ = losses during year i from rating j,

where the indices are

   i = 1, 2, ..., 7 for the years 1966, 1967, ..., 1972, respectively, and

   j = 1, 2, ..., N.

The estimated loss rate in 1973 for rating j, denoted $\hat{\ell}_j$,
was obtained from a weighted average of the actual loss rates
in prior years. Specifically,


$$\hat{\ell}_j = \frac{\sum_{i=1}^{7} a^{7-i} \, (L_{ij} / I_{ij})}{\sum_{i=1}^{7} a^{7-i}}$$

where $a$ is a fixed weighting factor, $0 < a < 1$. This estimated
loss rate was applied to the 1973 inventory $I_j$, yielding

$$\hat{L}_j = \hat{\ell}_j \cdot I_j$$

as the estimated loss from rating j in 1973, using no clustering.
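The unclustered estimate can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used in the study: the function names are hypothetical, each rating's yearly losses and inventories are assumed to be plain lists ordered from oldest (1966) to newest (1972), and the weights $a^{7-i}$ are written for a general number of years n as $a^{n-i}$.

```python
def weighted_loss_rate(losses, inventories, a):
    """Weighted average of the yearly loss rates L_ij / I_ij for one rating.

    losses, inventories -- yearly values for years i = 1..n, oldest first
    a                   -- weighting factor, 0 < a < 1
    Year i gets weight a**(n - i), so the most recent year has weight 1
    and earlier years are discounted geometrically.
    """
    n = len(losses)
    if n == 0 or n != len(inventories):
        raise ValueError("need matching, non-empty yearly series")
    weights = [a ** (n - i) for i in range(1, n + 1)]
    rates = [loss / inv for loss, inv in zip(losses, inventories)]
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


def predicted_losses(losses, inventories, a, next_inventory):
    """Estimated losses for the following year: estimated rate times inventory."""
    return weighted_loss_rate(losses, inventories, a) * next_inventory
```

With the seven series for 1966 through 1972 and the 1973 inventory as `next_inventory`, `predicted_losses` returns the no-clustering estimate for that rating.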

The same prediction scheme was used with clustering, and both
predictions were compared to the actual loss. To estimate the
loss rate with clusters, let $C_k$, $k = 1, 2, \ldots, m$, be the partition of
the ratings being considered. Then, pooling data over clusters
gives the formula for the common estimated loss rate of ratings
in cluster $C_k$:

$$\tilde{\ell}_j = \frac{\sum_{i=1}^{7} a^{7-i} \left( \sum_{j' \in C_k} L_{ij'} \Big/ \sum_{j' \in C_k} I_{ij'} \right)}{\sum_{i=1}^{7} a^{7-i}}$$

for every $j \in C_k$. Then the estimated loss is

$$\tilde{L}_j = \tilde{\ell}_j \cdot I_j .$$
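A companion sketch for the pooled (clustered) estimate, again with hypothetical names and data layout: yearly losses and inventories are assumed to be stored in dictionaries keyed by rating, and a partition is assumed to be a list of lists of ratings. Losses and inventories are summed over the cluster before each yearly rate is formed, and the same geometric weights are then applied.

```python
def pooled_cluster_rate(losses, inventories, cluster, a):
    """Common weighted loss rate for all ratings in one cluster C_k.

    losses, inventories -- dicts: rating -> list of yearly values, oldest first
                           (every rating has the same number of years)
    cluster             -- non-empty iterable of ratings forming C_k
    a                   -- weighting factor, 0 < a < 1
    """
    members = list(cluster)
    n = len(losses[members[0]])
    weights = [a ** (n - i) for i in range(1, n + 1)]
    pooled = []
    for t in range(n):
        # Pool the cluster: total losses over total inventory for year t.
        total_loss = sum(losses[r][t] for r in members)
        total_inv = sum(inventories[r][t] for r in members)
        pooled.append(total_loss / total_inv)
    return sum(w * p for w, p in zip(weights, pooled)) / sum(weights)


def predicted_losses_by_cluster(losses, inventories, partition, a, next_inventory):
    """Apply each cluster's pooled rate to every member's next-year inventory."""
    estimates = {}
    for cluster in partition:
        rate = pooled_cluster_rate(losses, inventories, cluster, a)
        for r in cluster:
            estimates[r] = rate * next_inventory[r]
    return estimates
```

A partition into one-element clusters reproduces the unclustered estimate, which is the limiting case mentioned later for "many clusters".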

It should be emphasized again that the prediction scheme used
here is not intended to be the best available for the data at hand.
Our purpose is only to evaluate the clustering by comparing loss
predictions with and without clustering, using the same prediction
scheme in both instances.

VI.   RESULTS  OF  CLUSTERING  EXPERIMENT 
1.   Dendrograms. 

Using the distance function defined in Chapter III, two
dendrograms were drawn for each of several values of the weighting
factor p. The two dendrograms correspond to the maximum and the
minimum metrics, respectively, between clusters as defined in
Chapter III. Figures 3 and 4 show examples of dendrograms with
the minimum and maximum metric, respectively. An undesirable
feature of all dendrograms with the minimum metric is, as can be
seen in Figure 3, that separation into clusters does not occur
until sets are at a fairly close "distance" to each other. For
example, in Figure 3, although two clusters form at a "distance"
of 15.60, the next separation into (three) clusters occurs at
a "distance" of 3.12. Further separations occur at very short
intervals, at "distance" values 2.25, 1.692, 1.688, etc. This
makes it rather difficult to decide on the number of clusters to be
used. In contrast, Figure 4 shows a typical dendrogram with the
maximum metric. Here separation into clusters occurs quite
gradually, at least until about ten clusters have formed. Separation
into two, three, four, etc., clusters occurs at the "distance"
values 48.7, 29.9, 18.2, 14.3, 9.4, 7.6, etc. This provides more
justification for choosing, e.g., four clusters rather than three or
five. In choosing the appropriate number of clusters one must
consider that, while too many clusters would defeat the purpose
of clustering, too few clusters would result in a prediction method
that is too crude. For this reason the proper choice probably
lies somewhere between three and ten clusters.
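The construction of such dendrograms can be illustrated with a naive agglomerative clustering sketch. It assumes a precomputed symmetric matrix of distances between ratings; the distance function of Chapter III, including the weighting factor p, is not reproduced here, and the function name and toy data are hypothetical. The "min" and "max" options correspond to the minimum and maximum metrics between clusters (commonly called single and complete linkage). Each recorded pair gives the height at which, reading the dendrogram from the top, the ratings separate into k clusters, so the gaps between successive heights can be inspected when choosing the number of clusters.

```python
def merge_heights(dist, linkage="max"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    dist    -- list of lists; dist[i][j] is the distance between items i and j
    linkage -- "min": distance between clusters is that of their closest pair
               "max": distance between clusters is that of their farthest pair
    Returns a list of (height, k) pairs, one per merge, in merge order.
    Reading the dendrogram from the top, the items separate into k
    clusters at that height.
    """
    if linkage not in ("min", "max"):
        raise ValueError("linkage must be 'min' or 'max'")
    pick = min if linkage == "min" else max
    clusters = [[i] for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        best = None
        # Find the pair of current clusters that are closest under the metric.
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = pick(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        history.append((d, len(clusters)))
        # Merge cluster y into cluster x; y > x, so index x stays valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return history
```

For four toy ratings at positions 0, 1, 5 and 6 on a line, the minimum metric separates them into two clusters at height 4 and the maximum metric at height 6. The brute-force search is adequate for 71 ratings; SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` or `method="complete"` performs the same construction more efficiently.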

2.   Evaluation  of  Clustering. 

In order to evaluate the effectiveness of clustering,
the prediction scheme described in Chapter V was devised. According
to this scheme, two estimates, $\hat{L}_j$ (without clustering) and $\tilde{L}_j$
(with clustering), were computed as predictions of the losses in 1973
from Rating j. When the 1973 data on losses became available, the
actual losses, $L_j$, from Rating j became known. Histograms
were then prepared for the following expressions:

(i)   $L_j - \hat{L}_j$ = error in prediction without clustering,
(ii)  $L_j - \tilde{L}_j$ = error in prediction with clustering,
(iii) $|L_j - \hat{L}_j| - |L_j - \tilde{L}_j|$ = difference in absolute
      errors without and with clustering,

(iv)  $(L_j - \hat{L}_j) / L_j$ = normalized error in prediction
      without clustering,
(v)   $(L_j - \tilde{L}_j) / L_j$ = normalized error in prediction
      with clustering,
(vi)  $(|L_j - \hat{L}_j| - |L_j - \tilde{L}_j|) / L_j$ = difference in absolute
      normalized errors without and with clustering.
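For a single rating, the six quantities can be computed as in the following sketch (hypothetical function name). It takes the actual loss $L_j$ as the normalizing divisor, which is how the list above is read here, and therefore assumes $L_j > 0$; a rating with zero actual losses would leave the normalized quantities undefined.

```python
def evaluation_measures(actual, pred_plain, pred_cluster):
    """Quantities (i)-(vi) for one rating.

    actual       -- actual losses L_j (assumed > 0)
    pred_plain   -- estimate without clustering
    pred_cluster -- estimate with clustering
    """
    if actual <= 0:
        raise ValueError("normalized measures need actual losses > 0")
    err_plain = actual - pred_plain      # (i)
    err_cluster = actual - pred_cluster  # (ii)
    return {
        "i": err_plain,
        "ii": err_cluster,
        "iii": abs(err_plain) - abs(err_cluster),
        "iv": err_plain / actual,
        "v": err_cluster / actual,
        # Positive when clustering is closer to the actual loss.
        "vi": (abs(err_plain) - abs(err_cluster)) / actual,
    }
```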
The histograms were specifically examined for cases where the
number of clusters was 3, 5, 7, 10, 15 and 20.

The proper choice of value for p, the parameter used to
weight past years according to importance in the clustering
scheme, was also investigated. The value of p could be based
on empirical data considerations; for example, since 0 ≤ p ≤ 1,
the larger the value of p, the more emphasis is placed on recent
years in the data base. In this study the value of p to employ
was based only on its effect on clustering. Figure 5 shows at
what level of the distance scale various numbers of clusters
formed as the value of p was changed. This Figure suggests
that in the vicinity of p = .1, the points on the distance
scale where clusters form are better separated from each other
than is the case for other values of p.

FIGURE 5

The choice of value for a, the parameter that weights past
years according to their importance in the prediction scheme,
was not specifically investigated. It seemed natural to assume
that a = p. However, there could be convincing arguments for
choosing a different from p.

Among the types of histograms listed above, item (vi) was
the most relevant for the evaluation of clustering. The "difference
in absolute normalized errors without and with clustering" measures
the relative success of clustering in predicting future losses
against the success of doing so by a comparable traditional
method. A large number of ratings with positive values for this
measure, especially large positive values, would indicate
significant success of clustering. A high percentage of ratings on
the negative side would suggest the opposite conclusion. The
actual results, however, were not conclusive either way. A typical
histogram is shown in Figure 6 for the case p = .1 and seven
clusters. The mean and median, as in most other such histograms,
are moderately negative, indicating that clustering was
slightly disadvantageous. As more and more clusters are used, the
histograms become concentrated at the origin, which is to be
expected, since using many clusters is practically equivalent to no
clustering at all. The choice of p did not seem to affect this
result a great deal, although p = .5 appeared
to be slightly more favorable to the clustering method. Figure 7
shows the histogram corresponding to the case p = .5 and seven
clusters.
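The reading of a type (vi) histogram described above can be condensed into a few summary numbers. The sketch below (hypothetical function name, Python standard library only) counts positive and negative values of measure (vi) over the ratings and reports their mean and median; positive values favour clustering and negative values favour the unclustered prediction.

```python
import statistics


def summarize(diffs):
    """Summary of measure (vi) over all ratings.

    Positive values mean clustering gave the smaller absolute normalized
    error for that rating; negative values mean the unclustered
    prediction did.
    """
    return {
        "n": len(diffs),
        "positive": sum(1 for d in diffs if d > 0),
        "negative": sum(1 for d in diffs if d < 0),
        "mean": statistics.mean(diffs),
        "median": statistics.median(diffs),
    }
```

A moderately negative mean and median, as reported for Figure 6, would show up here as negative `mean` and `median` values.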

The fact that the clustering method resulted in somewhat
bigger (absolute normalized) errors than the standard prediction
method does not render clustering totally worthless. Since the
two methods achieve a nearly identical measure of success, the
clustering method may have its advantages in shortening the data
processing procedures. This may be a more relevant factor when the
forecasting technique is not of the simple variety described here,
but is instead a more complex one, such as that used in FAST,
described in [2], [4] and [5].

The histograms presented above do not show the size of
errors made by either the clustering or the standard forecasting
method. The histogram presented in Figure 8 exhibits the size of
the normalized errors when forecasting by clustering (item (v)
above) for the case p = .1 and seven clusters. The horizontal
scale is in percent. The Figure shows that 58 of the 71
ratings had an error of less than 25% (positive or negative). For one
rating the error is shown as -100%. This is due to a rating
(Legalman) for which there were zero losses in 1973, while the
clustering method forecast 464. Since the zero loss in 1973 is
probably due to a data processing error, this large forecasting
error seems forgivable.
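The tally behind a histogram such as Figure 8 can be sketched as follows. The 25-point bin width, the function names, and the representation of errors as fractions (e.g. -1.0 for -100%) are illustrative assumptions, not details taken from the study.

```python
import math


def percent_histogram(errors, bin_width=25):
    """Count normalized errors (given as fractions) in percentage bins.

    Returns {lower_edge: count}, where a bin covers [edge, edge + bin_width)
    percent, sorted by edge.
    """
    counts = {}
    for e in errors:
        edge = math.floor(e * 100 / bin_width) * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


def count_within(errors, bound=0.25):
    """Number of ratings whose absolute normalized error is below bound."""
    return sum(1 for e in errors if abs(e) < bound)
```

Applied to the item (v) errors of all 71 ratings, `count_within` is the kind of count behind the statement that 58 ratings had less than 25% error.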

The histograms presented here are representative of the many
more cases which were tried. The results in every case were
essentially the same, namely one of indifference to clustering
the data for loss rate prediction. The number of subsets in a
was explored, as well as the choice of the parameters p and
a. The numerous dendrograms and histograms produced from these
experiments remain with the authors.

A by-product of this project is the identification of subsets
of ratings with common loss behavior. Such a grouping of ratings
would, for example, suggest guidelines for the application of personnel
policy to select groups of ratings. Other applications could be
explored as well by simply changing the criterion by which ratings
are judged to be close to each other. Groupings of ratings
could then be identified quickly and easily, based on characteristics
of behavior other than loss from the service.